perm filename HTSWTS.MRC[UP,DOC] blob sn#816752 filedate 1986-05-11 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00006 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.DEVICE XGP
C00003 00003	←%3How to start WAITS
C00007 00004	%2FIXING THE SYSTEM:%1
C00013 00005	%2RELOADING WAITS:%1
C00033 00006	%2RESTARTING THE KA-10:%1
C00036 ENDMK
C⊗;
.DEVICE XGP
.!XGPCOMMANDS←"/PMAR=0";
.!XGPLFTMAR←216;
.PAGE FRAME 70 HIGH 100 WIDE
.AREA TEXT LINES 1 TO 70 CHARS 1 TO 100
.PLACE TEXT
.FONT 1 "BASL30";
.FONT 2 "BASI30";
.FONT 3 "BUCK75";
.FONT 4 "FIX25";
.FONT 5 "FIX13X";
.FONT 6 "NGR20";
.TURN ON "←{%α↓_#"
.AT "ffi" ⊂ IF THISFONT ≤ 2 THEN "≠"  ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 2 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff"  ⊂ IF THISFONT ≤ 2 THEN "≥"  ELSE "fαf" ⊃;
.AT "fi"  ⊂ IF THISFONT ≤ 2 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl"  ⊂ IF THISFONT ≤ 2 THEN "∨"  ELSE "fαl" ⊃;
←%3How to start WAITS

%2FIND A WIZARD:%1

.BEGIN INDENT 5

Before you do anything, you should try to find a wizard.  Maybe there is
already one working on the problem -- if so, he will be very angry if you
disturb the machine.  Look for Martin Frost (3-2462, room 030d); or
(especially for network problems), look for Joe Weening (3-4202, room 360);
or, as a last resort, try Len Bosack's office (3-0445, room 040d).  If
necessary, call a wizard at home (but not in the middle of the night
unless it is an %2urgent%* problem, as defined below).

Phone numbers of wizards:
.BEGIN NOFILL; INDENT 0;

%4
Martin Frost (ME) 3-2462 (office, 030d)
		  9-856-1456 (home)
	          3-1266-192 (beeper, if no response to home phone)
		  9-329-8400 (Lucid, some weekday mornings)
Joe Weening (JJW) 3-4202 (office, 360); 965-8474 (home, don't call late at night)
Len Bosack (LB) (call only as a last resort, if urgent) 326-1967 (home)
.END
.BEGIN INDENT 0;

%1For an %2urgent%* problem you can beep ME at %2any%* time, %2but it
better really be urgent!%*  A problem is %2urgent%* if it is continuing or
repeating (e.g., you can't reload, or a similar emergency) or if the
system has been up for over 200 hours.  %2A#plain system crash doesn't
count as urgent%*, unless you are unable to reload after following the
reloading instructions below.  Even wizards don't like being awakened in
the middle of the night (unless the system has been up for hundreds of
hours).

To call by beeper, dial 3-1266, wait for the beep, dial 192, wait for
the next beep, then %2describe the problem in 10 seconds or less%*.  Your
message is transmitted right then by radio.
.END
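
The urgency rule above can be restated as a small predicate.  This is only
an illustrative Python sketch: the function name and its parameters are
invented here and are not part of any WAITS software.  The 200-hour figure
comes from the text above.

```python
def is_urgent(continuing_or_repeating: bool,
              uptime_hours: float,
              reload_failed: bool = False) -> bool:
    """Return True if the problem justifies beeping a wizard at any hour.

    Hypothetical helper: the names here are invented for illustration.
    """
    # A continuing or repeating problem (e.g., you can't reload) is urgent.
    if continuing_or_repeating or reload_failed:
        return True
    # So is a failure after the system has been up for over 200 hours.
    if uptime_hours > 200:
        return True
    # A plain system crash, by itself, does not count as urgent.
    return False
```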

Sometimes a wizard will dial up the CTY to fix things from home (possibly
without your knowing it).  When this happens, he may need some local help
from you.  %2Stand by%1 in case he asks you to do something like check
memory lights.

If you can't get in touch with a wizard, you'll have to fix it yourself;
see the instructions below.  After you fix it, make a note in the log with
the date, time and description of the failure (include any message typed
out on the CTY).  %2Sign your log note (with your SAIL programmer name, if
any)%1.  Use the observed log format when making your entry.  Thanks.
.END

**********************************************************************
.SKIP 1
%2FIXING THE SYSTEM:%1

.BEGIN
⊗#Before trying to fix the system, see if there is a note near the end of
the log with special instructions.  If not, read below.

⊗#Many crashes are bug traps and will print a message %2followed%* by:

←%4Find a WIZARD or type "$P".  $ means ESC.  You're in DDT.%1

If it prints this and you can't find a wizard, try typing [ESC] %4P%1 and
a couple of [RETURN]s.  If you get monitor dots, you're in luck.  Type
%4BEEP%1 and [RETURN] to tell everybody the good news.  %2Don't forget to
log the crash!%1

If after you type %4$P%1 the same thing happens, try %4$P%1 again.  If it
happens repeatedly, you'll have to reload, so go to step 1 below.  Certain
errors, like %4Page Fail, PI in Progress%1, require a wizard's
intervention; without help, the routine for such an error will just retry
the losing instruction, which naturally will fail in the same way again.
Routines for some other errors are able to fix the problem or bypass it and get
the system running again automatically when you type %4$P%1.  So the thing
to do is to try %4$P%1 a few times (if once doesn't fix things) before you
give up and reload (but always try to find a wizard before typing %4$P%1
even once).

⊗#If the system gets a %4NXM%1 (non-existent memory error), you may have to
reset a hung memory (reloading won't work); see step 200 below for how to do that.
Sometimes even resetting the memory won't help; in that case the memories
may have to be reconfigured or fixed.  You should leave that for a wizard
to do.

⊗#If the machine has powered itself off, then the %4FAULT%1 light on the
KL-10's console PDP-11 front panel will be lit (where it says "KL-10", that's
really a PDP-11).  Usually this indicates an air-flow problem in the cpu
or a tripped circuit breaker.  The cause of the fault will be indicated by
one (or more) of several indicator lights inside the back of the console
PDP-11 cabinet, at the bottom.  Before doing anything else, you should see
which indicator lights are on back there.  Usually it is %4AIR FLOW CPU%1
or %4CKT BKR TRIP%1.  Log the problem before continuing.  Then try
very hard to find a wizard.  Do %2NOT%1 power the system back on unless a
wizard tells you to do so!

⊗#If one of the messages %4?10 CLKOP%1 or %4?10 TTI%1 was printed on the
CTY, a memory may be hung or powered off, or the microcode
may be hung; try the command %4MC%* and [RETURN] to see if that helps.  If
not, you may have to reset a memory (see step 200 below) and/or reload
(but in any case, first seek a wizard!).

⊗#The message %4?10 CMD ERR%1 usually means that %4KLDCP%* is not working,
so it may have to be reloaded; see step 120 below for what to do.

⊗#If you see the error message %4CLOCK ERROR STOP, CRAM PARITY%*, then
there has been a parity error in the microcode that runs the KL.  You
should FIRST carry out step 140 below to record what the error was, THEN
reload the microcode in step 105, and finally finish reloading WAITS at
step 5.

⊗#If no message was printed, or a message was printed which doesn't look
like any of those above, you will probably have to reload (if you can't
find a wizard).

⊗#If explicit instructions are given in an error message, follow them.
.END
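
The message-by-message advice above amounts to a lookup from what the CTY
printed to what to try next.  The following Python sketch shows one way to
write that down, assuming simple substring matching; the table, the
function name, and the wording of the actions are invented for
illustration, and only the messages and step numbers come from this
document.

```python
# Hypothetical lookup table: CTY message fragment -> suggested next action.
# The step numbers are the ones used in this document; the table itself
# and the function name are invented for illustration.
ACTIONS = [
    ("CRAM PARITY", "do step 140, then step 105, then step 5"),
    ("?10 CMD ERR", "reload KLDCP (step 120)"),
    ("?10 CLKOP", "try MC; if no help, reset memory (step 200) and/or reload"),
    ("?10 TTI", "try MC; if no help, reset memory (step 200) and/or reload"),
    ("NXM", "reset the hung memory (step 200)"),
    ('type "$P"', "type $P a few times; if it keeps failing, reload (step 1)"),
]

DEFAULT = "reload (step 1)"


def suggest_action(cty_text: str) -> str:
    """Return the suggested action for the first matching message fragment."""
    for fragment, action in ACTIONS:
        if fragment in cty_text:
            return action
    # No message, or an unrecognized one: you will probably have to reload.
    return DEFAULT
```

The table does not cover the powered-off case, which is detected from the
FAULT light rather than from a CTY message, and it does not replace
looking for a wizard first.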

**********************************************************************
.SKIP 1
%2RELOADING WAITS:%1

.BEGIN INDENT 5
1.  If there has been a power failure, go to step 105.

2.  Type %4↑X%1 (i.e., hold down %4CTRL%1 and type %4X%1).  The response
should be %4KLDCP%1 (or else it may echo simply as %4↑X%*).  If the
command typed in the next step doesn't echo, try this step again; then if
typing the next step's command still doesn't seem to work, go to step 100.

3.  Type %4SP%* and [RETURN].  This stops the KL-10 and records useful
information, including the PC, on the CTY for later perusal by a wizard.
If this command gets you the message %4?UCODE HUNG%*, then type the
command %4ALL%* and [RETURN]; this logs a few lines of information
so a wizard can figure out how the microcode was hung.  In either case,
go on to step 4 next.

4.  Type %4DS%1 and [RETURN].  (%6If it responds with "Struct?", type
RN and [RETURN].%*)  If %4DS%1 gives you the proper response of
%4DSKDMP%1 and a star (%4*%1), go to step 5.  If you get the message
%4LOAD DSKDMP - USE LD%1, then you'll have to load the DSKDMP bootstrapper
from DECtape into the PDP-11 by typing the command %4LD BOOT%1 and
[RETURN] %6(if for some reason you are trying to reload from the Ampex
disks instead of the DEC RP07s, then you'll have to have selected/mounted
a different DECtape and the command to use here is %4LD NBOOT1%1)%*.
After doing %4LD BOOT%1, start step 4 over again.  If you get the message
%4DEX ERROR IN DS%1, then perhaps there is a hung memory which needs to be
reset; check the memories and reset any hung one(s) according to step 200,
and then return to the beginning of step 4.  If you've tried all of the
appropriate suggestions in this step and %4DS%* still fails, go to step
105.  If you still get %4DEX ERROR%1 after starting over at step 105, then
there is probably a failing memory and you'll have to get help from a
wizard.

5.  Type %4WAITS%1 and [RETURN].  (%6In certain rare cases, when the
PDP-11 realtime clock isn't working to supply WAITS with the date and
time, the system may ask you for the current date and time; if so, please
be careful to enter them correctly.)%*  If the system reloads and starts,
you are winning.  If the system doesn't start, you must get help.  %6If
(and only if!) the CTY says %4?10 CMD ERROR%* at this point, then you may
have to perform step 100c to reload KLDCP.%1  In any case, %2don't forget
to log the cause of the crash and the reload!%1

6.  Only perform this step if you reloaded KLDCP from DECtape somewhere
along the way to here.  If you reloaded KLDCP from DECtape and WAITS is
now running, then you need to reload KLDCP again, this time from the disk.
This is done simply by typing %411LOAD%* and [RETURN].  In several seconds,
you should see the message %4Stanford KLDCP - QMP/EN%* typed out.  (For more
details, see step 120.)

.END
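
Step 4 has the most branches of the steps above.  As a reading aid, here is
a minimal Python sketch of how the possible responses to DS lead to the
next action.  The function name and the returned strings are invented for
illustration; the quoted responses are the ones listed in step 4.

```python
def after_ds(response: str) -> str:
    """Map the CTY response to the DS command (step 4) to what to do next.

    Hypothetical helper; the returned strings paraphrase step 4.
    """
    if "Struct?" in response:
        return "type RN and RETURN"
    if "LOAD DSKDMP - USE LD" in response:
        return "type LD BOOT, then start step 4 over"
    if "DEX ERROR IN DS" in response:
        return "reset any hung memory (step 200), then start step 4 over"
    if "DSKDMP" in response and "*" in response:
        return "go to step 5"
    # Anything else, once the suggestions above have been tried.
    return "go to step 105"
```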

.BEGIN INDENT 0
%2Don't come here unless directed to by the steps above.%1
.END

.BEGIN INDENT 5
100.  KLDCP is the PDP-11 console program.  It prompts with "%4>.%1"
(a greater-than sign and a dot).  By typing carriage return, you should be
able to get another such prompt.  If so, KLDCP is running; go to step 1.
If you don't get the KLDCP prompt, continue here with 100a.

100a. Try restarting KLDCP: set 100014 in the PDP-11 switches (switches
15, 3 and 2 up, all the rest down); push HALT/ENABLE down and then back up;
push LOAD ADDRESS down and back up; press START.  KLDCP should
respond with a prompt; if so, go to step 1, else 100b.

100b. Try restarting KLDCP again, this time with 100004 in the address
switches (bits 15 and 2 up).  If you get the KLDCP prompt, go to step 1;
otherwise try one more starting address, namely 100010 (bits 15 and 3 up).
If this finally works, go to step 1, else go to 100c.

100c. If restarting KLDCP fails, KLDCP must be reloaded from DECtape.  (If
you reload it from DECtape, you will also HAVE to reload it subsequently
from disk, after WAITS is running -- see step 6.)  Make sure a DECtape
labelled "KL10 bootstrap" is mounted on the PDP-11 DECtape drive that is
selected to unit 0 and is enabled for "remote" (i.e., computer) operation.
Press the (LOAD) DECTAPE button (located above and to the left of the red
"Emergency Power Off" button) and hold it for at least a slow count to
one.  The DECtape should spin and eventually something like %4TCDP
monitor%1 should be typed.  Type in %4KLDCP%1 and [RETURN].  KLDCP should
load and type a message like %4Stanford KLDCP - QMP/EN%1.  If you don't
get to TCDP you might try pressing the (LOAD) DECTAPE button again.  If you
get to TCDP and the %4KLDCP%1 command doesn't work, get help.
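
The switch settings in steps 100a and 100b are octal numbers, and each
parenthetical list of switches is just the set of one bits in that number.
The Python sketch below checks this.  It assumes that switch n corresponds
to bit n of the value, with switch 0 the least significant, which is what
the examples in the text imply; the function name is invented for
illustration.

```python
def switches_up(octal_value: str) -> list[int]:
    """Return the switch (bit) numbers that must be up, highest first.

    Assumes switch n corresponds to bit n of the value, with switch 0
    as the least significant bit, as the examples in the text imply.
    """
    value = int(octal_value, 8)
    return [bit for bit in range(value.bit_length() - 1, -1, -1)
            if value & (1 << bit)]


print(switches_up("100014"))  # [15, 3, 2]
print(switches_up("100004"))  # [15, 2]
print(switches_up("100010"))  # [15, 3]
```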


.BEGIN INDENT 0
%2Start here after running any diagnostics, or after the power has been
off for the KL-10, or after verifying the microcode (step 140) following a
CRAM PARITY error.  Otherwise, don't come here unless directed to by the
steps above.%1
.END

105.  Reloading the KL-10's microcode and configuring the memory.  This is
done by running a bootstrap sequence from the SAIL KLAD pack on the RP06
disk.  The KLAD pack should already be mounted, ready, and write enabled
on the RP06 disk drive.  Here's what to do to reload the microcode and
configure memory:

.BEGIN PREFACE 0; INDENT 0,6; SKIP; TABS 7; TURN ON "\";

105a.\Set the PDP-11 switches to zero, and then push the black button
labelled (LOAD) DISK.  (%6If the KLAD pack is mounted on an RP06 with a
unit 2 plug instead of a unit 0 plug, then put 1207 in the PDP-11 switches
and push (LOAD) SW/REG and then type %4NO%* and [RETURN] to the question
about entering the dialog.)%1

105b.\A program (RSX20F) will be loaded into the PDP-11 and
started.  It will type many things (it takes about a minute: be
patient).  It will finally say: %4KLI#--#CONFIGURATION FILE WRITTEN%*, at
which point the microcode has been loaded and the memory configured,
and you can go on.

105c.\Perform step 100c (to reload our KLDCP from DECtape) and then continue here.

105d.\Now type %4LD BOOT%* and [RETURN] (or %4LD NBOOT1%* if reloading from the Ampex
disks for some reason).

105e.\Type %4DS%* and [RETURN].  DSKDMP should give its usual response of
%4DSKDMP%* and a star (%4*%1).  (%6If you instead get %4DEX ERROR
IN DS%*, then give the command %4EM 20%* and [RETURN] a couple of times.
If that types out a value from location 20, then now try %4DS%* and
[RETURN] again.  If it works, continue; if not, find help.%1)

105f.\Go to step 5 to finish reloading.
.END


.BEGIN INDENT 0
%2Don't come here unless directed to by the steps above.%1
.END

120.  Reloading KLDCP with the system already running, e.g., after step 5
is successful.  Sometimes the console-11's program, KLDCP, gets clobbered
and fails to work; this may be manifested by the failure of all attempts
from WAITS to reach other hosts via the Ethernet (since the PDP-11
contains the interface to the Ethernet).  Or you may see repeated messages
on the CTY saying %4?10 CMD ERR%1.  If KLDCP seems not to be working, you
can reload it while WAITS is running.  If the CTY is working (i.e.,
the console-11 is somewhat happy), then you can just type %411LOAD%* on
the CTY.  Normally the CTY is not usable with the system if KLDCP is not
happy, so a suitably privileged user must log in and incant either:

.BEGIN SELECT 4; no fill; no just; SKIP;
	11LOAD

%1or%*

	RUN 11LOAD[KL,SYS]
	AGRONK
	KLDCP.L11[KL,SYS]
.END

.END

.SKIP
%2Don't come here unless explicitly directed to by the steps above.%*

.BEGIN INDENT 5;
140. After a CRAM PARITY error, you should verify the microcode by doing
the following:
.END;

.BEGIN PREFACE 0; INDENT 0,6; SKIP; TABS 7; TURN ON "\";
140a.\Put 207 in the PDP-11 switches (or 1207 if the KLAD pack is on unit 2).

140b.\Push the black button called (LOAD) SW/REG (above the red power-off switch).

140c.\Respond to the prompt by typing YES and [RETURN].

140d.\Respond to the next prompt by typing VERIFY and [RETURN].
.END;

After you record any microcode parity errors listed, go to step 105 to
reload the microcode.

.SKIP
%2Don't come here unless explicitly directed to by the steps above.%*

.BEGIN INDENT 5;
200. Resetting hung memories.  If some error condition such as %4DEX ERROR
IN DS%1 or %4NXM%1 indicates that there is probably a hung memory, then
the memory needs to be reset.  There are two types of memories: the MG
memory (in two identical cabinets labelled, in the upper left corners, MGB
and MGA) and the ARM-10M memory (in one cabinet to the right of the two MG
boxes).  The three memory cabinets are located in the row behind the
KL-10.  Before resetting a memory, you should attempt to see if it is
hung.  This is done differently for the two different types of memory.
(If one or more of the three memory boxes has %2no lights on%*, then that
box has probably turned itself off -- in that case, find a wizard rather
than trying to fix it yourself!)

201. MGA and MGB: Each of these cabinets has an array of lights at the
top.  The bottom two rows in this array indicate the status of the two
controllers (cont 0 and cont 1) within each cabinet.  So there are four
controllers to check for being hung (or for having parity errors).  On
each controller's row of status lights, there is at the left end a light
labelled %4UA%1 (for Unit Available); if this light is out, the controller
is hung.  On the right end of each row of status lights is a light
labelled %4PAR ERR%1; if this light is on, then that controller has seen a
parity error.  If you notice a parity error, you should record it and also
record which of the %4RD%1 (read) and %4WR%1 (write) lights is on at the
other end of that row.  If you find a hung MG controller, you should do
the following to reset it:  (a) first push the RESET button on the front
of the C1 Disk Channel (next to the KL-10), and (b) then push the RESET
button on the bottom front of the MG that was hung (in each case, you must
open the magnetic door to get at the RESET button).  DO NOT RESET THE
MEMORY SIMPLY BECAUSE YOU FIND A PARITY ERROR LIGHT ON!  The parity error
light is simply a flag and does not affect memory operation.

202. ARM-10M: This memory has four HUNG lights at the bottom of the main
array of lights (visible through the window).  The four HUNG lights are
spread out, one for each sector, and each one is next to a RESET switch.
(You may not notice the lights if none is on, because of the dark
background, but you should see the word HUNG above a blank space where the
light really is, next to a RESET switch.)  The ARM-10M also has parity
error lights, in a line of four, one for each sector, labelled SECTOR
PARITY ERROR.  And just above those lights are four others labelled SECTOR
CONTROL ERROR.  Before resetting a hung memory, you should note whether
any of the parity error or control error lights are on; if any are on,
record in the log which one(s) they are.  To reset a hung sector, push the
RESET button next to the HUNG light that is on (you do NOT need to reset
the C1 before resetting the ARM-10M).  Again, NEVER RESET A MEMORY JUST
BECAUSE IT HAS A PARITY ERROR LIGHT ON!  You only need reset a memory if
it is actually hung.  If the ARM-10M is hung, you will end up having to
restart the KA-10 (after you get the system running again).
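
The rule for reading the MG status lights in step 201 can be summarized as
follows.  This is a minimal Python sketch with invented names, meant only
to show that a parity light leads to a log entry while a dark UA light
leads to a reset; it covers one MG controller and does not model the
ARM-10M.

```python
def mg_controller_action(ua_lit: bool, par_err_lit: bool) -> list[str]:
    """Suggest actions for one MG controller from its status lights.

    Hypothetical helper; the parameter names are invented for illustration.
    """
    actions = []
    if par_err_lit:
        # A parity light is only a flag: record it, never reset because of it.
        actions.append("log the parity error and which of RD/WR is on")
    if not ua_lit:
        # UA (Unit Available) out means the controller is hung.
        actions.append("push RESET on the C1 Disk Channel")
        actions.append("push RESET on the hung MG")
    return actions
```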

.END
**********************************************************************
.SKIP 1
%2RESTARTING THE KA-10:%1

.BEGIN INDENT 5
These instructions are for restarting the KA-10, which is the secondary
processor (P2).  They assume that the main timesharing system itself is
running; perhaps you are reading this because you were told to restart
the KA-10 by WAITS when you reloaded.  As always, make sure no wizard is
already working on it.

The KA-10 is the black computer (the KL-10 is blue).  Its console
panel should be about four feet to the left of this sheet.  It has
a lot of lights and switches on it.

To %2restart%1 the KA, first check the KA's address switches to be sure
that they are set to 204 (octal).  If you don't know how to do this, don't
worry since its switches should always be set to 204.  In addition, if the
KA stopped with the MEMORY STOP light on, make a log entry with details of
the memory lossage; the note on the KA console says how to do this.

Now, restart the KA by first pressing the RESET button on the KA-10
console panel and then pressing the START button.  In about 15 seconds,
you should see the bottom row of lights (above the long row of console
switches) counting upward.

If it doesn't look like the KA is working now, you may have to reload the
KA-10 (%2not%1 the regular system!!).

To %2reload%1 the KA, press the KA-10 RESET button again.  Then go to the
KL-10 CTY and type %4P2LOAD%1 and [RETURN].  When it finishes, press the
START button on the KA-10 console panel.  If this doesn't work now, get help.
.END

**********************************************************************
.SKIP 1
The PUB source for this file is %4HTSWTS.MRC[UP,DOC]%1.  Corrections
marked on this sheet will be noted therein.